Skip to content

Refuse secrets-backend fallback on Execution-API authz deny#66575

Merged
potiuk merged 3 commits into
apache:mainfrom
potiuk:fix/tasksdk-secrets-distinguish-authz-error
May 19, 2026
Merged

Refuse secrets-backend fallback on Execution-API authz deny#66575
potiuk merged 3 commits into
apache:mainfrom
potiuk:fix/tasksdk-secrets-distinguish-authz-error

Conversation

@potiuk
Copy link
Copy Markdown
Member

@potiuk potiuk commented May 7, 2026

Summary

The Execution API's authorization decision was being routed around by the
secrets-backend dispatcher: ExecutionAPISecretsBackend.get_connection /
get_variable returned None on every ErrorResponse, conflating "not
found" with "explicitly denied". And even with the backend patched to raise
on a 401/403, the dispatcher loops in
airflow/sdk/execution_time/context.py, airflow/models/connection.py,
and airflow/models/variable.py catch Exception and continue to the
next backend — so the deny would still have been swallowed silently.

Scope of the gap

In the default worker chain
([EnvironmentVariablesBackend, ExecutionAPISecretsBackend]),
ExecutionAPISecretsBackend is last — there is no later backend for the
dispatcher to fall through to, and the gap does not manifest. The
fall-through-leak scenario requires a non-default configuration: a
reordering, or a custom [secrets] backend placed after
ExecutionAPISecretsBackend. Even there, silently downgrading an
authoritative deny to "next backend, please" is wrong on principle: the
audit calls this a Type C gap — the authz control fires, but its
rejection is treated as a miss and routed around.

Fix (four parts)

1. New ErrorType.PERMISSION_DENIED

Distinct from API_SERVER_ERROR so callers can dispatch on the cause.

2. Client maps 401/403 to PERMISSION_DENIED

ConnectionOperations.get and VariableOperations.get translate the API
server's 401/403 to ErrorResponse(PERMISSION_DENIED, ...) instead of
re-raising as a generic ServerResponseError. 404 still maps to
*_NOT_FOUND; other statuses still raise so the existing
API_SERVER_ERROR translation in handle_requests keeps working.

3. New AirflowSecretsBackendAccessDenied(PermissionError)

Distinct, dispatcher-aware exception class. Subclasses PermissionError
so any existing except PermissionError: handler still matches, but is
narrow enough that the dispatcher can re-raise only this signal
without accidentally promoting an incidental filesystem
OSError-family PermissionError from inside an unrelated backend.

ExecutionAPISecretsBackend (sync + async, connection + variable
variants) raises this on PERMISSION_DENIED. Other ErrorResponse
types (*_NOT_FOUND, transient API_SERVER_ERROR, GENERIC_ERROR)
continue to return None so existing recovery paths keep working.

4. Dispatchers honour the deny

The three task-SDK dispatcher loops in
airflow/sdk/execution_time/context.py (_get_connection,
_async_get_connection, _get_variable) and the two airflow-core
dispatcher loops in airflow/models/connection.py and
airflow/models/variable.py now catch
AirflowSecretsBackendAccessDenied and re-raise it before the
generic except Exception: fall-through.

Tests

  • client.connections.get / client.variables.get return
    ErrorResponse(PERMISSION_DENIED) on 401 and 403 (parametrised).
  • ExecutionAPISecretsBackend.get_connection / get_variable /
    aget_connection / aget_variable raise
    AirflowSecretsBackendAccessDenied when the response is
    PERMISSION_DENIED.
  • End-to-end dispatcher tests in
    TestDispatcherRefusesFallbackOnDeny insert a spy backend AFTER
    ExecutionAPISecretsBackend and assert it is never called once
    the first backend raises the deny — pinning the dispatcher's
    re-raise behaviour, not just the backend's. Covers
    _get_connection, _get_variable, and _async_get_connection.

Reported by

L3 ASVS sweep — apache/tooling-agents#24 (FINDING-017).


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.7)

Generated-by: Claude Code (Opus 4.7) following the guidelines

ExecutionAPISecretsBackend.get_connection / get_variable returned None
on every ErrorResponse, conflating "not found" with "explicitly denied".
The secrets-backend dispatcher then fell through to the next backend
(typically EnvironmentVariablesBackend, which performs no authz checks)
on a 401/403 from the Execution API -- letting tasks read secrets the
Execution API had just denied them. The audit calls this a Type C gap:
the authz control fires, but its rejection result is treated as a
miss and routed around.

Three-part fix:

1. New `ErrorType.PERMISSION_DENIED` distinct from `API_SERVER_ERROR`.

2. `ConnectionOperations.get` and `VariableOperations.get` map the
   API server's 401/403 to `ErrorResponse(PERMISSION_DENIED, ...)`
   instead of re-raising as a generic `ServerResponseError`. 404 still
   maps to `*_NOT_FOUND`; other statuses still raise so the existing
   API_SERVER_ERROR translation in `handle_requests` keeps working.

3. `ExecutionAPISecretsBackend` (sync + async, connection + variable
   variants) now raises `PermissionError` on `PERMISSION_DENIED`. The
   surrounding `except Exception:` blocks explicitly re-raise
   `PermissionError` so the secrets-backend dispatcher sees it. NOT_FOUND
   types continue to return `None` (allow fallthrough); other
   ErrorResponses also continue to return `None` (preserve existing
   recovery behaviour for transient errors).

Tests added:

- `client.connections.get` and `client.variables.get` return
  `ErrorResponse(PERMISSION_DENIED)` on 401 and 403 (parametrised).
- `ExecutionAPISecretsBackend.get_connection` / `get_variable` /
  `aget_connection` / `aget_variable` raise `PermissionError` when the
  response is `PERMISSION_DENIED`, with the resource and key in the
  message.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-017).
@potiuk potiuk force-pushed the fix/tasksdk-secrets-distinguish-authz-error branch from e522ad9 to 5063192 Compare May 17, 2026 19:30
@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented May 17, 2026

I'd love to get this one merged — and would love it in 3.2.2 if it's not too late. cc @vatsrahul1001 (3.2.2 RM)


Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting

@potiuk potiuk added this to the Airflow 3.2.2 milestone May 17, 2026
@potiuk potiuk added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label May 17, 2026
@vatsrahul1001 vatsrahul1001 added the ready for maintainer review Set after triaging when all criteria pass. label May 18, 2026
@vatsrahul1001
Copy link
Copy Markdown
Contributor

LGTM! ready for maintainer review

Comment thread task-sdk/tests/task_sdk/execution_time/test_secrets.py Outdated
Comment thread task-sdk/src/airflow/sdk/execution_time/secrets/execution_api.py
Comment thread task-sdk/src/airflow/sdk/execution_time/secrets/execution_api.py
Comment thread task-sdk/tests/task_sdk/execution_time/test_secrets.py
@kaxil
Copy link
Copy Markdown
Member

kaxil commented May 18, 2026

I'd love to get this one merged — and would love it in 3.2.2 if it's not too late. cc @vatsrahul1001 (3.2.2 RM)

Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting

As-is, it isn't ready, check the comments above

The earlier change made ExecutionAPISecretsBackend raise on 401/403, but
the dispatcher loops in airflow.sdk.execution_time.context and the
airflow-core get_*_from_secrets paths catch Exception and silently fall
through to the next backend — so the deny was still being swallowed.

Introduce AirflowSecretsBackendAccessDenied (subclass of PermissionError)
so the dispatchers can special-case the authoritative deny without
mis-treating an incidental OSError-family PermissionError from inside an
unrelated backend. Patch the three task-SDK dispatcher loops and the two
airflow-core dispatcher loops to re-raise it before the generic except.

Add TestDispatcherRefusesFallbackOnDeny with three end-to-end tests that
insert a spy backend after ExecutionAPISecretsBackend and assert the spy
is never called once the first backend raises the deny — pinning the
dispatcher behaviour, not just the backend's. Also hoist the repeated
imports in test_secrets.py to module top per review feedback.
@potiuk potiuk requested a review from XD-DENG as a code owner May 19, 2026 08:18
@potiuk potiuk removed the ready for maintainer review Set after triaging when all criteria pass. label May 19, 2026
@potiuk potiuk merged commit 2b8c805 into apache:main May 19, 2026
113 checks passed
@potiuk potiuk deleted the fix/tasksdk-secrets-distinguish-authz-error branch May 19, 2026 11:24
@github-actions
Copy link
Copy Markdown
Contributor

Backport successfully created: v3-2-test

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

vatsrahul1001 pushed a commit that referenced this pull request May 19, 2026
…ny (#66575) (#67173)

* Refuse secrets-backend fallback on Execution-API authz deny

ExecutionAPISecretsBackend.get_connection / get_variable returned None
on every ErrorResponse, conflating "not found" with "explicitly denied".
The secrets-backend dispatcher then fell through to the next backend
(typically EnvironmentVariablesBackend, which performs no authz checks)
on a 401/403 from the Execution API -- letting tasks read secrets the
Execution API had just denied them. The audit calls this a Type C gap:
the authz control fires, but its rejection result is treated as a
miss and routed around.

Three-part fix:

1. New `ErrorType.PERMISSION_DENIED` distinct from `API_SERVER_ERROR`.

2. `ConnectionOperations.get` and `VariableOperations.get` map the
   API server's 401/403 to `ErrorResponse(PERMISSION_DENIED, ...)`
   instead of re-raising as a generic `ServerResponseError`. 404 still
   maps to `*_NOT_FOUND`; other statuses still raise so the existing
   API_SERVER_ERROR translation in `handle_requests` keeps working.

3. `ExecutionAPISecretsBackend` (sync + async, connection + variable
   variants) now raises `PermissionError` on `PERMISSION_DENIED`. The
   surrounding `except Exception:` blocks explicitly re-raise
   `PermissionError` so the secrets-backend dispatcher sees it. NOT_FOUND
   types continue to return `None` (allow fallthrough); other
   ErrorResponses also continue to return `None` (preserve existing
   recovery behaviour for transient errors).

Tests added:

- `client.connections.get` and `client.variables.get` return
  `ErrorResponse(PERMISSION_DENIED)` on 401 and 403 (parametrised).
- `ExecutionAPISecretsBackend.get_connection` / `get_variable` /
  `aget_connection` / `aget_variable` raise `PermissionError` when the
  response is `PERMISSION_DENIED`, with the resource and key in the
  message.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-017).

* Refuse secrets-backend fallback at the dispatcher, not only the backend

The earlier change made ExecutionAPISecretsBackend raise on 401/403, but
the dispatcher loops in airflow.sdk.execution_time.context and the
airflow-core get_*_from_secrets paths catch Exception and silently fall
through to the next backend — so the deny was still being swallowed.

Introduce AirflowSecretsBackendAccessDenied (subclass of PermissionError)
so the dispatchers can special-case the authoritative deny without
mis-treating an incidental OSError-family PermissionError from inside an
unrelated backend. Patch the three task-SDK dispatcher loops and the two
airflow-core dispatcher loops to re-raise it before the generic except.

Add TestDispatcherRefusesFallbackOnDeny with three end-to-end tests that
insert a spy backend after ExecutionAPISecretsBackend and assert the spy
is never called once the first backend raises the deny — pinning the
dispatcher behaviour, not just the backend's. Also hoist the repeated
imports in test_secrets.py to module top per review feedback.

* Hoist AirflowSecretsBackendAccessDenied imports to module top
(cherry picked from commit 2b8c805)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit that referenced this pull request May 20, 2026
…ny (#66575) (#67173)

* Refuse secrets-backend fallback on Execution-API authz deny

ExecutionAPISecretsBackend.get_connection / get_variable returned None
on every ErrorResponse, conflating "not found" with "explicitly denied".
The secrets-backend dispatcher then fell through to the next backend
(typically EnvironmentVariablesBackend, which performs no authz checks)
on a 401/403 from the Execution API -- letting tasks read secrets the
Execution API had just denied them. The audit calls this a Type C gap:
the authz control fires, but its rejection result is treated as a
miss and routed around.

Three-part fix:

1. New `ErrorType.PERMISSION_DENIED` distinct from `API_SERVER_ERROR`.

2. `ConnectionOperations.get` and `VariableOperations.get` map the
   API server's 401/403 to `ErrorResponse(PERMISSION_DENIED, ...)`
   instead of re-raising as a generic `ServerResponseError`. 404 still
   maps to `*_NOT_FOUND`; other statuses still raise so the existing
   API_SERVER_ERROR translation in `handle_requests` keeps working.

3. `ExecutionAPISecretsBackend` (sync + async, connection + variable
   variants) now raises `PermissionError` on `PERMISSION_DENIED`. The
   surrounding `except Exception:` blocks explicitly re-raise
   `PermissionError` so the secrets-backend dispatcher sees it. NOT_FOUND
   types continue to return `None` (allow fallthrough); other
   ErrorResponses also continue to return `None` (preserve existing
   recovery behaviour for transient errors).

Tests added:

- `client.connections.get` and `client.variables.get` return
  `ErrorResponse(PERMISSION_DENIED)` on 401 and 403 (parametrised).
- `ExecutionAPISecretsBackend.get_connection` / `get_variable` /
  `aget_connection` / `aget_variable` raise `PermissionError` when the
  response is `PERMISSION_DENIED`, with the resource and key in the
  message.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-017).

* Refuse secrets-backend fallback at the dispatcher, not only the backend

The earlier change made ExecutionAPISecretsBackend raise on 401/403, but
the dispatcher loops in airflow.sdk.execution_time.context and the
airflow-core get_*_from_secrets paths catch Exception and silently fall
through to the next backend — so the deny was still being swallowed.

Introduce AirflowSecretsBackendAccessDenied (subclass of PermissionError)
so the dispatchers can special-case the authoritative deny without
mis-treating an incidental OSError-family PermissionError from inside an
unrelated backend. Patch the three task-SDK dispatcher loops and the two
airflow-core dispatcher loops to re-raise it before the generic except.

Add TestDispatcherRefusesFallbackOnDeny with three end-to-end tests that
insert a spy backend after ExecutionAPISecretsBackend and assert the spy
is never called once the first backend raises the deny — pinning the
dispatcher behaviour, not just the backend's. Also hoist the repeated
imports in test_secrets.py to module top per review feedback.

* Hoist AirflowSecretsBackendAccessDenied imports to module top
(cherry picked from commit 2b8c805)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit that referenced this pull request May 20, 2026
…ny (#66575) (#67173)

* Refuse secrets-backend fallback on Execution-API authz deny

ExecutionAPISecretsBackend.get_connection / get_variable returned None
on every ErrorResponse, conflating "not found" with "explicitly denied".
The secrets-backend dispatcher then fell through to the next backend
(typically EnvironmentVariablesBackend, which performs no authz checks)
on a 401/403 from the Execution API -- letting tasks read secrets the
Execution API had just denied them. The audit calls this a Type C gap:
the authz control fires, but its rejection result is treated as a
miss and routed around.

Three-part fix:

1. New `ErrorType.PERMISSION_DENIED` distinct from `API_SERVER_ERROR`.

2. `ConnectionOperations.get` and `VariableOperations.get` map the
   API server's 401/403 to `ErrorResponse(PERMISSION_DENIED, ...)`
   instead of re-raising as a generic `ServerResponseError`. 404 still
   maps to `*_NOT_FOUND`; other statuses still raise so the existing
   API_SERVER_ERROR translation in `handle_requests` keeps working.

3. `ExecutionAPISecretsBackend` (sync + async, connection + variable
   variants) now raises `PermissionError` on `PERMISSION_DENIED`. The
   surrounding `except Exception:` blocks explicitly re-raise
   `PermissionError` so the secrets-backend dispatcher sees it. NOT_FOUND
   types continue to return `None` (allow fallthrough); other
   ErrorResponses also continue to return `None` (preserve existing
   recovery behaviour for transient errors).

Tests added:

- `client.connections.get` and `client.variables.get` return
  `ErrorResponse(PERMISSION_DENIED)` on 401 and 403 (parametrised).
- `ExecutionAPISecretsBackend.get_connection` / `get_variable` /
  `aget_connection` / `aget_variable` raise `PermissionError` when the
  response is `PERMISSION_DENIED`, with the resource and key in the
  message.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-017).

* Refuse secrets-backend fallback at the dispatcher, not only the backend

The earlier change made ExecutionAPISecretsBackend raise on 401/403, but
the dispatcher loops in airflow.sdk.execution_time.context and the
airflow-core get_*_from_secrets paths catch Exception and silently fall
through to the next backend — so the deny was still being swallowed.

Introduce AirflowSecretsBackendAccessDenied (subclass of PermissionError)
so the dispatchers can special-case the authoritative deny without
mis-treating an incidental OSError-family PermissionError from inside an
unrelated backend. Patch the three task-SDK dispatcher loops and the two
airflow-core dispatcher loops to re-raise it before the generic except.

Add TestDispatcherRefusesFallbackOnDeny with three end-to-end tests that
insert a spy backend after ExecutionAPISecretsBackend and assert the spy
is never called once the first backend raises the deny — pinning the
dispatcher behaviour, not just the backend's. Also hoist the repeated
imports in test_secrets.py to module top per review feedback.

* Hoist AirflowSecretsBackendAccessDenied imports to module top
(cherry picked from commit 2b8c805)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit that referenced this pull request May 21, 2026
…ny (#66575) (#67173)

* Refuse secrets-backend fallback on Execution-API authz deny

ExecutionAPISecretsBackend.get_connection / get_variable returned None
on every ErrorResponse, conflating "not found" with "explicitly denied".
The secrets-backend dispatcher then fell through to the next backend
(typically EnvironmentVariablesBackend, which performs no authz checks)
on a 401/403 from the Execution API -- letting tasks read secrets the
Execution API had just denied them. The audit calls this a Type C gap:
the authz control fires, but its rejection result is treated as a
miss and routed around.

Three-part fix:

1. New `ErrorType.PERMISSION_DENIED` distinct from `API_SERVER_ERROR`.

2. `ConnectionOperations.get` and `VariableOperations.get` map the
   API server's 401/403 to `ErrorResponse(PERMISSION_DENIED, ...)`
   instead of re-raising as a generic `ServerResponseError`. 404 still
   maps to `*_NOT_FOUND`; other statuses still raise so the existing
   API_SERVER_ERROR translation in `handle_requests` keeps working.

3. `ExecutionAPISecretsBackend` (sync + async, connection + variable
   variants) now raises `PermissionError` on `PERMISSION_DENIED`. The
   surrounding `except Exception:` blocks explicitly re-raise
   `PermissionError` so the secrets-backend dispatcher sees it. NOT_FOUND
   types continue to return `None` (allow fallthrough); other
   ErrorResponses also continue to return `None` (preserve existing
   recovery behaviour for transient errors).

Tests added:

- `client.connections.get` and `client.variables.get` return
  `ErrorResponse(PERMISSION_DENIED)` on 401 and 403 (parametrised).
- `ExecutionAPISecretsBackend.get_connection` / `get_variable` /
  `aget_connection` / `aget_variable` raise `PermissionError` when the
  response is `PERMISSION_DENIED`, with the resource and key in the
  message.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-017).

* Refuse secrets-backend fallback at the dispatcher, not only the backend

The earlier change made ExecutionAPISecretsBackend raise on 401/403, but
the dispatcher loops in airflow.sdk.execution_time.context and the
airflow-core get_*_from_secrets paths catch Exception and silently fall
through to the next backend — so the deny was still being swallowed.

Introduce AirflowSecretsBackendAccessDenied (subclass of PermissionError)
so the dispatchers can special-case the authoritative deny without
mis-treating an incidental OSError-family PermissionError from inside an
unrelated backend. Patch the three task-SDK dispatcher loops and the two
airflow-core dispatcher loops to re-raise it before the generic except.

Add TestDispatcherRefusesFallbackOnDeny with three end-to-end tests that
insert a spy backend after ExecutionAPISecretsBackend and assert the spy
is never called once the first backend raises the deny — pinning the
dispatcher behaviour, not just the backend's. Also hoist the repeated
imports in test_secrets.py to module top per review feedback.

* Hoist AirflowSecretsBackendAccessDenied imports to module top
(cherry picked from commit 2b8c805)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:task-sdk backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants